Learning Speech Emotion Representations in the Quaternion Domain

Authors

Abstract

The modeling of human emotion expression in speech signals is an important, yet challenging task. The high resource demand of speech emotion recognition models, combined with the general scarcity of emotion-labelled data, are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach to jointly circumvent these difficulties. Our method, named RH-emo, is a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monoaural spectrograms, enabling the use of quaternion-valued networks for speech emotion recognition tasks. RH-emo is a hybrid real/quaternion autoencoder network that consists of an encoder in parallel with a classifier and a quaternion-valued decoder. On the one hand, the classifier permits the optimization of each latent axis for the classification of a specific emotion-related characteristic: valence, arousal, dominance, and overall emotion. On the other hand, the reconstruction enables the latent dimensions to develop the intra-channel correlations required for a representation as a quaternion entity. We test our approach on speech emotion recognition tasks using four popular datasets: IEMOCAP, RAVDESS, EmoDB, and TESS, comparing the performance of three well-established CNN architectures (AlexNet, ResNet-50, VGG) with that of their quaternion-valued equivalents fed with the embeddings created by RH-emo. We obtain a consistent improvement in accuracy on all datasets, while drastically reducing the models' resource demand. Moreover, we performed additional experiments and ablation studies that confirm the effectiveness of the approach. The repository is available at: https://github.com/ispamm/rhemo .
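The abstract describes the RH-emo layout only at a high level, so the following PyTorch sketch illustrates the idea: a real-valued encoder maps a mono spectrogram to a four-channel latent, one small classification head is attached to each latent channel (valence, arousal, dominance, discrete emotion), and a decoder reconstructs the spectrogram from all four channels together. The layer sizes, the binary high/low heads, and the plain real-valued decoder standing in for RH-emo's quaternion-valued one are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class RHEmoSketch(nn.Module):
    """Hybrid autoencoder/classifier sketch in the spirit of RH-emo."""

    def __init__(self, n_classes: int = 4):
        super().__init__()
        # real-valued encoder: mono spectrogram -> 4 latent channels
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 4, kernel_size=3, stride=2, padding=1),
        )
        # one head per latent axis: valence/arousal/dominance as high/low,
        # the last head over the discrete emotion classes (assumed sizes)
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16, c))
            for c in (2, 2, 2, n_classes)
        ])
        # real-valued decoder used here as a stand-in for the quaternion one
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, spec):
        z = self.encoder(spec)                                   # (B, 4, H/4, W/4)
        logits = [head(z[:, i:i + 1]) for i, head in enumerate(self.heads)]
        recon = self.decoder(z)                                  # (B, 1, H, W)
        return z, logits, recon

# toy usage: 8 mono spectrogram patches whose height/width are divisible by 4
spec = torch.randn(8, 1, 128, 128)
model = RHEmoSketch()
z, logits, recon = model(spec)
dummy_targets = torch.zeros(8, dtype=torch.long)                 # placeholder labels
loss = nn.functional.mse_loss(recon, spec) + sum(
    nn.functional.cross_entropy(l, dummy_targets) for l in logits)
```

In training, the reconstruction term keeps the four latent channels jointly informative about the input (the role the quaternion decoder plays in RH-emo), while the per-axis classification terms tie each channel to one emotion descriptor; how the two objectives are weighted is not specified here.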


Similar Articles

Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition

As a hot topic of speech signal processing, speech emotion recognition methods have been developed rapidly in recent years. Some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the feature distributions of di...


Variational Autoencoders for Learning Latent Representations of Speech Emotion

Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning h...
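This abstract is cut off before the model details, so the snippet below is only a generic PyTorch variational autoencoder over fixed-size speech feature frames, showing how an unsupervised latent representation can be learned and later reused as an emotion feature. The layer sizes, the MSE reconstruction term, and the choice of the latent mean as the downstream feature are assumptions of this sketch, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechVAE(nn.Module):
    """Minimal VAE over fixed-size feature frames (e.g., log-mel vectors)."""

    def __init__(self, n_feats: int = 128, n_latent: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_feats, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_latent)       # posterior mean
        self.logvar = nn.Linear(256, n_latent)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_feats))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # reconstruction error plus KL divergence to a standard normal prior
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# toy usage: after training, the returned mu would feed an emotion classifier
x = torch.randn(16, 128)
recon, mu, logvar = SpeechVAE()(x)
loss = vae_loss(x, recon, mu, logvar)
```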


A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

The quality of speech signals is significantly reduced in the presence of environmental noise, which degrades the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
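The description breaks off before the algorithm itself, so the snippet below only illustrates the general recipe the title points to: sparse dictionary learning over wavelet-domain speech frames, here with PyWavelets and scikit-learn and with random vectors standing in for real speech frames. It is a generic illustration, not the paper's incoherent-model algorithm.

```python
import numpy as np
import pywt                                      # PyWavelets
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
frames = rng.standard_normal((64, 256))          # stand-in for clean speech frames

# move each frame to the wavelet domain and stack the coefficients as features
X = np.stack([np.concatenate(pywt.wavedec(f, "db4", level=3)) for f in frames])

# learn a sparse dictionary over the clean wavelet coefficients
dico = DictionaryLearning(n_components=32, max_iter=100,
                          transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, random_state=0)
codes = dico.fit_transform(X)                    # sparse code of each frame
X_hat = codes @ dico.components_                 # wavelet-domain reconstruction
```

A denoising pipeline in this spirit would sparse-code the noisy wavelet coefficients on the learned speech dictionary and invert the transform with pywt.waverec, keeping only the components the dictionary can explain.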


The Analysis of the Role of the Speech Acts Theory in Translating and Dubbing Hollywood Films

Among the most central effects that a feature film creates are the dialogues spoken by its actors. In a filmmaker's view, one way of affecting the audience through the work is the illocutionary force of the speakers' utterances, such as emotional, frightening, sad, or exciting force. This study set out to examine whether the actors' perlocutionary force, as speech acts, is reproduced in the dubbed versions of five Hollywood films...


Machine Learning Methods in the Application of Speech Emotion Recognition

Machine Learning concerns the development of algorithms that allow machines to learn via inductive inference based on observed data representing incomplete information about a statistical phenomenon. Classification, also referred to as pattern recognition, is an important task in Machine Learning, by which machines "learn" to automatically recognize complex patterns, to distinguish between...



Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2023.3250840